Handling missing probe-sets in a linear gene classifier

نویسنده

  • Rowan Kuiper
چکیده

In multiple myeloma (a type of cancer) linear models (e.g. EMC92) can be used to estimate a patients’ prognosis. In case the model outcome exceeds a specific value t (i.e. a dichotomizing threshold) the subject is classified as highrisk. These linear models consist of a number of covariates, each contributing to the model outcome. In the case of EMC92 or UAMS70, these covariates are probe sets. This vignette deals with the situation that not all covariates are available for generating the model outcome in independent data. The method is based on redistributing the weights of the discarded covariates over the remaining covariates based on the covariance structure in the training data of that model, i.e. a reweighted model.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank

Credit risk management is a process in which banks estimate probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data, and these internal data sets are usually completed using external credit bureau’s data and finally used for estimating PD in banks. There is also a continuous interest for bank to use rule based classifiers to b...

متن کامل

A study on the use of imputation methods for experimentation with Radial Basis Function Network classifiers handling missing attribute values: The good synergy between RBFNs and EventCovering method

The presence of Missing Values in a data set can affect the performance of a classifier constructed using that data set as a training sample. Several methods have been proposed to treat missing data and the one used more frequently is the imputation of the Missing Values of an instance. In this paper, we analyze the improvement of performance on Radial Basis Function Networks by means of the us...

متن کامل

مقایسه روش الگوریتم EM و روش‌های متداول جانهی داده‌های گمشده: مطالعه‌روی پرسشنامه خوددرمانی بیماران دیابتی

Background and Objectives: Missing data is a big challenge in the research. According to the type of the study and of the variables, different ways have been proposed to work with these data. This study compared five popular imputation approaches in addressing missing data in the questionnaires. Methods: In this study, 500 questionnaires were used for self-medication in diabetic patients. Mi...

متن کامل

Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets

With the advancement of metagenome data mining science has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection or feature selection is essential. So, it follows that you can remove redundant genes and increase the speed and accuracy of classification. After applying the gene se...

متن کامل

Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method

Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identification of potential genes from microarray data for diagnosis of cancer is important. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expressio...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017